Systems and methods for intelligent disk rebuild and logical grouping of SAN storage zones

ABSTRACT

A method of rebuilding a replacement drive used in a RAID group of drives is disclosed. The rebuilding method includes tracking data modification operations continuously during use of the drives. The method also includes saving the tracked data modifications to a log in a persistent storage, where the tracked data modifications are associated with stripe data present on the drives. The method further includes rebuilding a failed one of the drives with a replacement drive. The rebuilding is facilitated by referencing the log from the persistent storage, the log facilitating reading only portions of stripe data from surviving drives and omitting reading of portions from the drives where no data was written. Thus, the rebuilding only rebuilds the stripe data to the replacement drive. Also provided is a zoning method, which enables logical zone creation from storage area networks.

CLAIM OF PRIORITY

This application claims the benefit of (1) U.S. Provisional Application No. 60/947,851, filed on Jul. 3, 2007, and entitled “Systems and Methods for Automatic Storage Initiators Grouping in a Multi-Path Storage Environment;” (2) U.S. Provisional Application No. 60/947,878, filed on Jul. 3, 2007, and entitled “Systems and Methods for Server-Wide Initiator Grouping in a Multi-Path Storage Environment;” (3) U.S. Provisional Patent Application No. 60/947,881, filed on Jul. 3, 2007, and entitled “Systems and Methods for Intelligent Disk Rebuild;” (4) U.S. Provisional Patent Application No. 60/947,884, filed on Jul. 3, 2007, and entitled “Systems and Methods for Logical Grouping of San Storage Zones;” and (5) U.S. Provisional Patent Application No. 60/947,886, filed on Jul. 3, 2007, and entitled “Systems and Methods for Automatic Provisioning of Storage and Operating System Installation,” the disclosures of which are incorporated herein by reference.

FIELD OF THE INVENTION

Embodiments of this invention generally relate to replacing a failed disk drive that is part of a RAID drive group, rebuilding the replaced disk drive, and creating logical groupings of SAN storage.

BACKGROUND OF THE INVENTION

When a drive that is part of a RAID 1, RAID 5, or RAID 6 drive group fails, the failing drive should be replaced. Once the failing drive is replaced, RAID controllers go through a process called rebuild. For RAID 1, this would involve a copy operation from the surviving drive to the replaced drive. For RAID 5 and RAID 6, this would involve a reconstruction of the data or parity from the surviving drives to the replaced drive.

Currently, storage is allocated from individual storage enclosures. When provisioning the storage in a SAN environment, the user must understand the location, capabilities, reliability, and access control associated with each storage enclosure. Therefore, the user needs to keep track of the location, reliability, capabilities, and access control characteristics of each storage enclosure.

In view of these issues, embodiments of the invention arise.

SUMMARY

Broadly speaking, embodiments of the invention provide methods and systems for intelligent rebuilding of the replaced disk drive after disk failure, and creating SAN storage zones to logically group a plurality of storage devices.

With the increase in disk drive sizes, rebuild times are becoming exorbitantly long, taking many hours or days. Long rebuild times are a detriment since they impact the overall RAID controller performance and, in addition, leave user data exposed without protection. If, for example, a second drive fails while a RAID 5 drive group is rebuilding, the drive group will go offline and the data on that drive group will be lost. Speeding up rebuild times is therefore an essential requirement going forward. In one embodiment, rebuild times are reduced by using a host write tracking persistent log. The log is configured to keep track of what areas on the disk group have been written by the host since the drive group was constructed. As a result, there is no need to reconstruct an unwritten area since there is no data to reconstruct.

In another embodiment, a method of rebuilding a replacement drive used in a RAID group of drives is disclosed. The method includes tracking data modification operations continuously during use of the drives. The method also includes saving the tracked data modifications to a log in a persistent storage, where the tracked data modifications are associated with stripe data present on the drives. The method further includes rebuilding a failed one of the drives with a replacement drive. The rebuilding is facilitated by referencing the log from the persistent storage, the log facilitating reading only portions of stripe data from surviving drives and omitting reading of portions from the drives where no data was written. Thus, the rebuilding only rebuilds the stripe data to the replacement drive.

In another embodiment, storage zones are defined. The logical grouping of SAN storage is established based on location or other characteristics, instead of based upon individual storage enclosures within a SAN. For example, the storage zone can consist of all the storage located within one computer rack, the storage contained within a building, or storage with particular characteristics, such as performance, cost, and reliability. Along these lines, initiator permissions are defined for each created storage zone. One benefit of zoning is that it allows for simplified storage administration and simplified storage allocation and/or use. Initiator permissions and policy are then associated with storage zones. Thus, SAN storage can be allocated via “logical grouping” and not individual storage enclosures.

In yet another embodiment, a method of creating storage area network zones is disclosed. The method includes identifying a plurality of storage devices. Each of the plurality of storage devices is then assigned to a logical group, where the logical group is identified by characteristics. The plurality of storage devices is then presented as part of the logical group without regard to enclosure identifications. Access and control properties are then assigned to the logical group, which provide access to the plurality of storage devices. Administration is also now carried out for the logical group, instead of the physical characteristics or individual SANs. Thus, easy SAN grouping can be carried out, where administration is simplified.

Other aspects of the invention will become more apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1 and 2 show stripe data tables, illustrating data associated with rebuilding of a replacement disk drive after disk failure, in accordance with one embodiment of the present invention.

DETAILED DESCRIPTION

Embodiments of the invention provide methods and systems for intelligent rebuilding of the replaced disk drive after disk failure and creating SAN storage zones to logically group a plurality of storage devices.

In iSCSI (Internet Small Computer Systems Interface) compliant Storage Area Networks, the SCSI commands are sent in IP packets. Use of IP packets to send SCSI commands to the disk arrays enables implementation of a SAN over an existing Ethernet. Leveraging the IP network for implementing a SAN also permits use of IP and Ethernet features, such as sorting out packet routes and alternate paths for sending the packets.

iSCSI is a protocol that allows clients (called initiators) to send SCSI commands (CDBs) to SCSI storage devices (targets) on remote servers. This Storage Area Network (SAN) protocol allows organizations to consolidate storage into data center storage arrays while providing hosts (such as database and web servers) with the illusion of locally-attached disks. Unlike Fibre Channel, which requires special-purpose cabling, iSCSI can be run over long distances using existing network infrastructure.

In iSCSI, therefore, there are two main functional entities: initiators and targets. Initiators are machines that need to access data, and targets are machines that provide the data. A target could be a RAID array or another computer system. Targets handle iSCSI requests from initiators. Target machines may include hot standby machines with “mirrored” storage. If the active machine fails, the standby machine will take over to provide the iSCSI service, and when the failed machine returns, the failed machine will re-synchronize with the standby machine and then take back the iSCSI service.

With the increase in disk drive sizes, rebuild times are becoming exorbitantly long, taking many hours or days. Long rebuild times are a detriment since they impact the overall RAID controller performance and, in addition, leave the customer's data exposed and possibly not protected. If, for example, a second drive fails while a RAID 5 drive group is rebuilding, the drive group will go offline and the data on that drive group will be lost. Speeding up rebuild times is therefore an essential requirement going forward. The embodiments of the present invention typically provide a faster rebuild of the replaced drive.

The main performance-limiting issues with disk storage relate to the slow mechanical components that are used for positioning and transferring data. Since a RAID drive group has many drives in it, an opportunity presents itself to improve performance by using the hardware in all these drives in parallel. For example, if we need to read a large file, instead of pulling it all from a single hard disk, it is much faster to chop it up into pieces, store some of the pieces on each of the drives in the group, and then use all the disks to read back the file when needed. This technique of chopping up pieces of files is called striping.

Striping can be done at the byte level, or in blocks. Byte-level striping means that the file is broken into “byte-sized pieces”. The first byte of the file is sent to the first drive, then the second to the second drive, and so on. Sometimes byte-level striping is done in units of a 512-byte sector. Block-level striping means that each file is split into blocks of a certain size and those are distributed to the various drives. The size of the blocks used is also called the stripe size (or block size, or several other names), and can be selected from a variety of choices when the drive group is set up.
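By way of illustration only, block-level striping as described above can be sketched as follows. The function names stripe_blocks and read_back, and the round-robin placement order, are assumptions made for this sketch and are not taken from the described embodiments.

```python
from typing import List


def stripe_blocks(data: bytes, num_drives: int, block_size: int) -> List[List[bytes]]:
    """Split data into fixed-size blocks and place them round-robin on the drives.

    Hypothetical helper: each inner list stands in for one drive.
    """
    if num_drives < 1 or block_size < 1:
        raise ValueError("num_drives and block_size must be positive")
    drives: List[List[bytes]] = [[] for _ in range(num_drives)]
    for index, offset in enumerate(range(0, len(data), block_size)):
        # Block number `index` goes to drive `index mod num_drives`.
        drives[index % num_drives].append(data[offset:offset + block_size])
    return drives


def read_back(drives: List[List[bytes]]) -> bytes:
    """Reassemble the data by visiting the drives in the same round-robin order."""
    pieces = []
    depth = max((len(drive) for drive in drives), default=0)
    for row in range(depth):
        for drive in drives:
            if row < len(drive):
                pieces.append(drive[row])
    return b"".join(pieces)
```

In this sketch, block_size plays the role of the stripe size selected when the drive group is set up.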

The advantages of the present invention are numerous. Most notably, the systems and methods described herein provide a faster way of rebuilding the replaced disk in a RAID group by tracking data modification operations (or striping information) (e.g., write, delete, update) continuously and rebuilding the replaced drive by reading only the written portions of stripes from one or more surviving disk drives in the RAID array.

In one embodiment, the disk rebuild time is reduced by the use of a persistent write operations tracking module. The persistent write operations tracking module keeps track of what areas on the disk group have been written by the host since the drive group was constructed. The tracking information is stored in a persistent tracking log. With the information contained in the persistent tracking log, a replaced disk drive can be rebuilt quickly by selectively reading only parts (e.g., striping information) of one or more surviving disk drives. There is no need to reconstruct an unwritten area since there is no data to reconstruct. A simplified example using a RAID 1 drive group is shown in FIG. 1.

The persistent tracking log is used to track the stripes that have been written. FIG. 2 illustrates an example of the persistent tracking log.

When the rebuild algorithm starts, it looks at the persistent log and determines which stripes need to be rebuilt. In the example illustrated by FIG. 2, stripes 0, 1, and 3 need to be rebuilt and stripes 2, 4, 5, and 6 do not need to be rebuilt because the “written” flag is “false”, which means that no data was written to stripes 2, 4, 5, and 6 after the disk drive group was constructed or put to work in the RAID. This simple example would result in a >50% reduction in rebuild time, since four of the seven stripes need not be read. Thus, a percentage savings can be identified as a function of used and unused space on the disk drives being rebuilt.
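A minimal sketch of this selection step, for a RAID 1 (mirrored) drive group, is given below for illustration. The log contents mirror the “written” flags described for FIG. 2; the names FIG2_LOG, stripes_to_rebuild, rebuild_mirror, and skipped_fraction are hypothetical, and the in-memory lists merely stand in for drives.

```python
from typing import Dict, List, Optional

Stripe = Optional[bytes]

# "Written" flags per stripe, matching the example described for FIG. 2.
FIG2_LOG: Dict[int, bool] = {
    0: True, 1: True, 2: False, 3: True, 4: False, 5: False, 6: False,
}


def stripes_to_rebuild(written_log: Dict[int, bool]) -> List[int]:
    """Return the stripe numbers whose written flag is set."""
    return sorted(stripe for stripe, written in written_log.items() if written)


def rebuild_mirror(surviving: List[Stripe], written_log: Dict[int, bool]) -> List[Stripe]:
    """Copy only the written stripes from the surviving drive to the replacement.

    Unwritten stripes are never read and stay None on the replacement.
    """
    replacement: List[Stripe] = [None] * len(surviving)
    for stripe in stripes_to_rebuild(written_log):
        replacement[stripe] = surviving[stripe]
    return replacement


def skipped_fraction(written_log: Dict[int, bool]) -> float:
    """Fraction of stripes that the rebuild does not need to touch."""
    if not written_log:
        return 0.0
    skipped = sum(1 for written in written_log.values() if not written)
    return skipped / len(written_log)
```

With the flags above, four of seven stripes are skipped, a fraction of roughly 0.57. Treating this fraction as the time saved assumes that rebuild time is proportional to the number of stripes processed.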

In one embodiment, the persistent tracking log is maintained by the RAID controller. In another embodiment, the persistent tracking log may be maintained by any component of the computing system with which the RAID array is in communication, so long as the persistent tracking log can be retrieved at a later time to rebuild the replacement drive. The persistent tracking log, in one embodiment, is stored in a relational database. In another embodiment, the persistent tracking log is stored in a non-volatile memory, including a disk drive, ROM, Flash Memory, or any similar storage media.
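As one possible illustration of the relational database option, the sketch below keeps the tracking log in SQLite through the Python standard library sqlite3 module. The table name tracking_log, its columns, and the helper names are assumptions made for this sketch; the described embodiments do not specify a schema.

```python
import sqlite3
from typing import List


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the tracking log.

    Pass a file path for a log that survives restarts; the default
    in-memory database is only suitable for experimentation.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tracking_log ("
        "stripe INTEGER PRIMARY KEY, "
        "written INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def record_write(conn: sqlite3.Connection, stripe: int) -> None:
    """Mark a stripe as written; repeated writes to a stripe are harmless."""
    conn.execute(
        "INSERT OR REPLACE INTO tracking_log (stripe, written) VALUES (?, 1)",
        (stripe,),
    )
    conn.commit()


def written_stripes(conn: sqlite3.Connection) -> List[int]:
    """Return the stripes that a later rebuild would need to process."""
    rows = conn.execute(
        "SELECT stripe FROM tracking_log WHERE written = 1 ORDER BY stripe"
    )
    return [row[0] for row in rows]
```

Committing after every recorded write keeps the log durable at the cost of extra I/O; a real controller would likely batch updates, which is outside the scope of this sketch.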

In accordance with another embodiment, methods and systems for creating SAN storage zones to logically group a plurality of storage devices are provided. The advantages provided by this embodiment are numerous. Most notably, the systems and methods described herein eliminate a need for the user to keep track of the storage characteristics and location of each individual storage enclosure.

Instead, a logical group consisting of a plurality of storage enclosures that may be located at different locations and have different storage characteristics is created. The logical group of storage enclosures is then made available as a single storage enclosure to the user. The administrator of the logical group may modify the characteristics of the logical group by adding or removing one or more storage enclosures, or by changing the locations of the one or more storage enclosures in a logical group.

In one embodiment, the storage enclosures in a logical group are hidden from the user. Hence, any change (e.g., adding or removing enclosures, changing location, etc.) in the structure of logical groups does not affect overall system configuration and usage. Therefore, the logical grouping of the storage enclosures simplifies the management of the Storage Area Network (SAN) and permits efficient storage, configuration, and privilege management.

With the creation of the storage zone, i.e., the logical grouping of the storage enclosures, SAN storage is no longer viewed at the enclosure level. The storage enclosures are logically grouped together to meet customers' unique requirements for administrating, provisioning, and usage of the storage enclosures.

The storage administrator defines the storage zone by creating a logical group and adding the selected storage enclosures to the logical group. The access control properties are then defined, and permissions are assigned to individual storage initiators, e.g., iSCSI (Internet Small Computer Systems Interface), Fibre Channel (FC), SAS, etc. Initiator permissions can be unique for each initiator within a storage zone. In one embodiment, logical groups of initiators can also be defined and added to a particular storage zone.

In one embodiment, the SAN administrator(s) defines grouping properties for each of the physical and logical storage coupled to the SAN appliances. The SAN appliance as described herein is a box including slots for a plurality of server blades, RAID disk arrays, and SAN control and management software to control and manage the server blades, RAID, data buses, and other necessary components of the SAN. The properties may include location of the storage, names of special characteristics, capabilities, and type of the storage. In one embodiment, each property in the properties is structured in a tree structure format. For example, under a “Location” named node in the property tree structure, a node named “Building 23” is created. Under the “Building 23” node, a child node named “Server Room A” may be created. More sibling and child nodes may be created to properly identify a location. The properties may be stored anywhere in the SAN so long as the appliance in which the zone grouping is being created may read the properties.
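For illustration, the property tree described above might be represented as in the following sketch. The class name PropertyNode and its methods are hypothetical; the node name “Server Room B” in the usage note is likewise an invented sibling used only to show branching.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PropertyNode:
    """One node of a property tree, e.g. "Location" or "Building 23"."""

    name: str
    children: Dict[str, "PropertyNode"] = field(default_factory=dict)

    def add_path(self, *names: str) -> "PropertyNode":
        """Create any missing child nodes along the path and return the last one."""
        node = self
        for name in names:
            if name not in node.children:
                node.children[name] = PropertyNode(name)
            node = node.children[name]
        return node

    def paths(self, prefix: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
        """List every root-to-leaf path below (and including) this node."""
        here = prefix + (self.name,)
        if not self.children:
            return [here]
        result: List[Tuple[str, ...]] = []
        for child in self.children.values():
            result.extend(child.paths(here))
        return result
```

For example, calling add_path("Building 23", "Server Room A") and then add_path("Building 23", "Server Room B") on a root node named "Location" yields two leaf paths that share the "Building 23" node.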

One or more zone grouping rules are then created and stored in the SAN. The zone grouping rule may define a set of properties that, if matched, would trigger creation of a zone group. A zone grouping rule may be set to be active or inactive. The appliance discovers all the storages that are coupled to the appliance and retrieves the properties associated with each storage. Further, based on one or more active zone grouping rules, the appliance attempts to match the properties of the storages. If a matching rule is satisfied, the appliance creates a zone group of the storages that provide matching properties as defined by one or more zone grouping rules. The zone groups are then permanently stored in the appliance. The SAN administrator may edit the zone groups if a change in the group is necessary.
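The rule matching described above can be sketched, under simplifying assumptions, as follows. Properties are flattened here to key/value strings rather than tree nodes, and the names ZoneRule and build_zone_groups, as well as the enclosure identifiers in the usage note, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ZoneRule:
    """A zone grouping rule: properties that must all match, plus an active flag."""

    zone_name: str
    required: Dict[str, str]
    active: bool = True

    def matches(self, properties: Dict[str, str]) -> bool:
        # Every required property must be present with the required value.
        return all(properties.get(key) == value for key, value in self.required.items())


def build_zone_groups(
    storages: Dict[str, Dict[str, str]], rules: List[ZoneRule]
) -> Dict[str, List[str]]:
    """Apply each active rule to the discovered storages and collect zone groups."""
    groups: Dict[str, List[str]] = {}
    for rule in rules:
        if not rule.active:
            continue  # inactive rules never trigger group creation
        members = sorted(
            name for name, properties in storages.items() if rule.matches(properties)
        )
        if members:
            groups[rule.zone_name] = members
    return groups
```

As a usage example, with storages "enc-1" and "enc-2" both carrying location "Building 23", an active rule requiring that location produces one zone group containing both, while an inactive rule produces nothing.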

A set of default group properties is provided. One or more default group properties are attached to a newly created zone group. The zone group rule would specify which default group properties are to be used for a newly created group. The group properties may include permissions and privilege grants to one or more storage initiators.

In one embodiment, storage zones may be created by grouping the storage enclosures based on a location. In another embodiment, storage zones may be created by grouping the storage enclosures based on reliability characteristics of the storage enclosures. In yet another embodiment, a zone group may be created based on any physical or logical characteristics so long as the physical or logical characteristics are defined in the properties of the storage enclosures and one or more zone group rules are defined to use the physical or logical characteristics to create zone groups.

By providing a layer of abstraction over the storage initiators and storage enclosures, initiator storage allocation does not require involvement of the Storage Area Network (SAN) administrator. The storage initiators work with the storage zones and not with the physical storage enclosures. Furthermore, more storage enclosures can be seamlessly added to a storage zone without impacting availability of storage interface to the initiators of users and without a need to create access control properties for the newly added storage enclosure. Similarly, new storage initiators may be added to a storage zone without impacting the usage of the physical storage enclosures in the storage zone.
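The separation between zone membership and initiator permissions can be illustrated with the sketch below. The class StorageZone, its methods, and the initiator and enclosure names in the usage note are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class StorageZone:
    """A storage zone: enclosures and initiator permissions are tracked separately."""

    name: str
    enclosures: Set[str] = field(default_factory=set)
    permissions: Dict[str, Set[str]] = field(default_factory=dict)

    def grant(self, initiator: str, *rights: str) -> None:
        """Grant rights to an initiator at the zone level."""
        self.permissions.setdefault(initiator, set()).update(rights)

    def add_enclosure(self, enclosure: str) -> None:
        """Add an enclosure; existing zone-level permissions apply to it unchanged."""
        self.enclosures.add(enclosure)

    def can(self, initiator: str, right: str) -> bool:
        """Check a right for an initiator against the zone, not an enclosure."""
        return right in self.permissions.get(initiator, set())
```

In this sketch, adding a second enclosure such as "enc-2" to a zone leaves the permissions mapping untouched, so an initiator that could already write to the zone requires no new access control entry.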

Since, from a usage viewpoint, a storage zone is treated the same as a physical storage enclosure, a unique set of permissions may be associated with the storage zone, similar to associating access control properties with a physical storage enclosure. Therefore, the logical grouping of SAN storage greatly simplifies the administration and use of the storage enclosures.

With the above embodiments in mind, it should be understood that the invention may employ various hardware and software implemented operations involving data stored in computer systems. These operations are those requiring physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. Further, the manipulations performed are often referred to in terms such as producing, identifying, determining, or comparing.

Any of the operations described herein that form part of the invention are useful machine operations. The invention also relates to a device or an apparatus for performing these operations. The apparatus may be specially constructed for the required purposes, such as the carrier network discussed above, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.

The programming modules, page modules, and subsystems described in this document can be implemented using a programming language such as Flash, JAVA, C++, C, C#, Visual Basic, JavaScript, PHP, XML, HTML, etc., or a combination of programming languages. Commonly available application programming interfaces (APIs) such as HTTP API, XML API, and parsers are used in the implementation of the programming modules. As would be known to those skilled in the art, the components and functionality described above and elsewhere in this document may be implemented on any desktop operating system which provides support for a display screen, such as different versions of Microsoft Windows, Apple Mac, Unix/X-Windows, Linux, etc., using any programming language suitable for desktop software development.

The programming modules and ancillary software components, including a configuration file or files, along with setup files required for installing and related functionality as described in this document, are stored on a computer readable medium. Any computer medium, such as a flash drive, a CD-ROM disk, an optical disk, a floppy disk, a hard drive, a shared drive, and any storage suitable for providing downloads from connected computers, could be used for storing the programming modules and ancillary software components. It would be known to a person skilled in the art that any storage medium could be used for storing these software components so long as the storage medium can be read by a computer system.

The invention may be practiced with other computer system configurations including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a network.

As used herein, a storage area network (SAN) is an architecture to attach remote computer storage devices (such as disk arrays, tape libraries, and optical jukeboxes) to servers in such a way that, to the operating system, the devices appear as locally attached.

The invention can also be embodied as computer readable code on a computer readable medium. The computer readable medium is any data storage device that can store data, which can thereafter be read by a computer system. Examples of the computer readable medium include hard drives, network attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, DVDs, Flash, magnetic tapes, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over network-coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.

While this invention has been described in terms of several preferred embodiments, it will be appreciated that those skilled in the art, upon reading the specification and studying the drawings, will realize various alterations, additions, permutations, and equivalents thereof. It is therefore intended that the present invention include all such alterations, additions, permutations, and equivalents as fall within the true spirit and scope of the claims.

1. A method of rebuilding a replacement drive used in a RAID group of drives, comprising: tracking data modification operations continuously during use of the drives; saving the tracked data modifications to a log in a persistent storage, the tracked data modifications being associated with stripe data present on the drives; and rebuilding a failed one of the drives with a replacement drive, the rebuilding being facilitated by referencing the log from the persistent storage, and the log facilitating reading only portions of stripe data from surviving drives and omitting reading of portions from the drives where no data was written, so that the rebuilding only rebuilds the stripe data to the replacement drive.
2. The method of rebuilding a replacement drive as recited in claim 1, wherein RAID level-5 writes data in stripes across multiple drives.
3. The method of rebuilding a replacement drive as recited in claim 1, wherein the replacement drive is rebuilt using the stripe data present on surviving drives that did not experience a failure, and the replacement drive completes the RAID group of drives.
4. The method of rebuilding a replacement drive as recited in claim 1, wherein modification operations include one or more of write operations, delete operations, or update operations.
5. The method of rebuilding a replacement drive as recited in claim 1, wherein the log identifies particular stripes to rebuild.
6. The method of rebuilding a replacement drive as recited in claim 5, wherein the log provides flags identifying written or no data.
7. The method of rebuilding a replacement drive as recited in claim 6, wherein rebuild time is reduced as a percentage of amount of stripes not requiring rebuild.
8. The method of rebuilding a replacement drive as recited in claim 1, wherein the log is stored in a relational database, a disk drive, a ROM, or a Flash Memory.
9. A method of creating storage area network zones, comprising: identifying a plurality of storage devices; assigning each of the plurality of storage devices to a logical group, the logical group being identified by characteristics; presenting the plurality of storage devices as part of the logical group without regard to enclosure identifications; assigning access control properties to the logical group, which provide access to the plurality of storage devices.
10. A method of creating storage area network zones as recited in claim 9, wherein one or more grouping rules are created and stored in a storage area network zone.
11. A method of creating storage area network zones as recited in claim 10, further comprising: discovering each storage device in the zone; and retrieving properties of each storage.
12. A method of creating storage area network zones as recited in claim 10, wherein the characteristics include one or more of location, name, purpose, physical attribute, or logical attribute.